
[v22.x] src: clamp WriteUtf8 capacity to INT_MAX in EncodeInto #62621

Open
semimikoh wants to merge 2 commits into nodejs:v22.x-staging from semimikoh:fix-encodeinto-intmax

Conversation

@semimikoh
Contributor

Backport notice

This is a v22.x-only patch, not a cherry-pick. The bug it fixes was
incidentally resolved on main / v24.x by #58070, which migrated
EncodeInto from v8::String::WriteUtf8 to WriteUtf8V2. The
intent of #58070 was deprecation cleanup, not bug fixing.

WriteUtf8V2 is not available in v22.x's bundled V8 version, so
backporting #58070 in full is not feasible. This PR instead applies
a minimal scoped fix to EncodeInto only.

Problem

In TextEncoder.encodeInto, the destination buffer's byte length
is read as size_t but then implicitly narrowed to int when
passed as the capacity argument to v8::String::WriteUtf8. When
the destination view is larger than INT_MAX (2,147,483,647
bytes), the narrowing conversion underflows to a negative value.
V8 treats this as "no capacity available" and writes 0 bytes,
returning { read: 0, written: 0 } even though the buffer has
ample space.
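The root cause is the implicit `size_t` → `int` narrowing. As an illustration only (not part of the patch), JavaScript's `| 0` operator performs the same signed 32-bit wrap that the C++ narrowing produces on typical platforms:

```javascript
// Illustration only: `| 0` coerces to a signed 32-bit integer, mimicking
// the implementation-defined size_t -> int narrowing at the C++ call site.
const byteLength = 2429682061;   // the ~2.4 GB view length from the repro
const narrowed = byteLength | 0; // wraps past INT_MAX into negative territory
console.log(narrowed);           // -1865285235: V8 sees "no capacity"
```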

Reproduction on v22.x (from #62610):

```js
const _TE = new TextEncoder();
const s = "aÿ我𝑒";
const u8a = new Uint8Array(2429682061); // ~2.4 GB
const offset = 38928786;

_TE.encodeInto(s, u8a.subarray(offset));
// v22.x: { read: 0, written: 0 } ← bug
// v24.x: { read: 5, written: 10 } ← correct

_TE.encodeInto(s, u8a.subarray(offset, offset + 10));
// Both versions: { read: 5, written: 10 }
// (works because the bounded view is only 10 bytes, well within int range)
```

This is a silent data loss bug — no error is thrown, the caller
just gets a successful-looking return value with zero bytes written.

Fix

Clamp the capacity passed to WriteUtf8 to INT_MAX in
EncodeInto. The source string in encodeInto is bounded in
practice and never requires more than INT_MAX bytes to encode;
only the destination view length can exceed INT_MAX.
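The clamp semantics can be sketched as follows (in JavaScript for illustration; the actual patch is in C++ and the function name here is hypothetical):

```javascript
// Illustration only: clamp the capacity before it is narrowed to a signed
// 32-bit int, as the C++ fix does with INT_MAX.
const INT32_MAX = 2 ** 31 - 1;
function clampedCapacity(byteLength) {
  return Math.min(byteLength, INT32_MAX);
}
console.log(clampedCapacity(2429682061) | 0); // 2147483647, stays positive
console.log(clampedCapacity(10) | 0);         // 10, small views unaffected
```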

Testing

Built locally on v22.x-staging (v22.22.2-pre) and verified:

  • ✅ Original reproduction from #62610 ("TextEncoder.encodeInto NOT work on BIG utf8 subarray ON 22.4.1") now returns { read: 5, written: 10 } for both unbounded and bounded subarrays
  • ✅ New pummel regression test (test/pummel/test-whatwg-encoding-encodeinto-large.js) passes with the patched binary. Follows existing pummel conventions: skips on 32-bit architectures via common.skipIf32Bits(), skips on insufficient os.totalmem(), and skips on allocation failure
  • test/parallel/test-whatwg-encoding*, test-textencoder*, test-textdecoder* — 11 tests pass
  • test/parallel/test-buffer-*, test-string-decoder* — 69 tests pass
  • test/pummel/test-whatwg-encoding* — passes
  • cpplint.py on the modified file — clean
  • make lint-js — clean

Refs: #58070
Fixes: #62610

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch. labels Apr 7, 2026
@Renegade334
Member

Please also ensure your commits are signed off.

@Renegade334 Renegade334 changed the title [v22.x backport] src: clamp WriteUtf8 capacity to INT_MAX in EncodeInto [v22.x] src: clamp WriteUtf8 capacity to INT_MAX in EncodeInto Apr 7, 2026
Member

@Renegade334 Renegade334 left a comment


Implementation LGTM, I think the tests could do with being simplified though.

In TextEncoder.encodeInto, the destination buffer's byte length is
read as a size_t but then implicitly narrowed to int when passed as
the capacity argument to v8::String::WriteUtf8. When the destination
view is larger than INT_MAX (2,147,483,647 bytes), the narrowing
conversion underflows to a negative value, V8 treats it as "no
capacity", and writes 0 bytes - returning { read: 0, written: 0 }
even though the buffer has plenty of room.

Clamp the capacity to INT_MAX before passing it to WriteUtf8. This
is sufficient because the source string in encodeInto is bounded in
practice and never requires more than INT_MAX bytes to encode; only
the destination view length can exceed INT_MAX.

This issue is already fixed on main and v24.x as a side effect of
PR nodejs#58070, which migrated to the non-deprecated WriteUtf8V2 method
whose capacity parameter is size_t. WriteUtf8V2 is not available in
v22.x's V8 version, so this minimal patch fixes only the EncodeInto
path instead of backporting the full migration.

Refs: nodejs#58070
Fixes: nodejs#62610
Signed-off-by: semimikoh <ejffjeosms@gmail.com>
Drop the bounded subarray case, remove the unnecessary error code
check, and use a 2**31 byte Uint8Array directly instead of slicing
an offset subarray.

Signed-off-by: semimikoh <ejffjeosms@gmail.com>
@semimikoh semimikoh force-pushed the fix-encodeinto-intmax branch from bb48d92 to b1de92b on April 7, 2026 22:50
Member

@Renegade334 Renegade334 left a comment


👍

@Renegade334 Renegade334 added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. request-ci Add this label to start a Jenkins CI on a PR. labels Apr 13, 2026
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 13, 2026
